Quandl provides a powerful API for economics and finance specilists. It is a one place for all solution for those who are interested both in publiclky available macroeconomic, financial and other data and for those who are ready to pay for Bloomberg-and-Reuters-like company data. The API can be accessed even from Excel. It has libraries for several programming languages including R and Python. The package for Python is called Quandl and and can be installed using pip as usuallly:
pip install Quandl
It is worth mentioning that Quandl also provides unauthorized access to several databases in CSV, JSON and XML formats. As an example, one may consider getting GDP (Gross Domestic Product) data from FRED (Federal Researve Economic Data) using the following links:
*Note: the only difference between this links is the last component: file dimension.
Yet, using authorized access to Quandl trough the abovementioned library provides some user-frnedly features, includign the fact that by default most data is received into already well-known Pandas dataframes (thus, we can use the head() function, drop or fill missing values, plot etc.).
In [3]:
import quandl
Let's now get the same data trough the library and using our API key for authorized access.
In [8]:
US_GDP_data = quandl.get("FRED/GDP",authtoken = 'somesecrettokenhere')
In [9]:
US_GDP_data.head()
Out[9]:
As you can see, the dates were directly saved as an index (row name) which simplifies our life: we can choose whatever period we want by indexing it. Still, the get() function can receive more arguments including the trim_start and trim_end to provide data of specific period.
In [11]:
US_GDP_data_2016 = quandl.get("FRED/GDP",authtoken = 'somesecrettokenhere',
trim_start="2016-1-1", trim_end="2016-12-31")
US_GDP_data_2016.head()
Out[11]:
One may specify even the frequency: e.g. annual, monthly, quarterly (of course, if the data is available at that level).
In [12]:
US_GDP_data_2000s = quandl.get("FRED/GDP",authtoken = 'somesecrettokenhere',
trim_start="2000-1-1", collapse='annual')
US_GDP_data_2000s.head()
Out[12]:
Another advantage of using the authorized access is that one cat get a data for the same period on different indicators into the the same dataframe. Let's get the GDP, Unemployment rate, Interest rate and CPI (Consumer Price Index) for Urban customers data from the FRED database on Quandl.
In [13]:
data = quandl.get(["FRED/GDP","FRED/UNRATE", "FRED/FEDFUNDS", "FRED/CPIAUCSL"],
authtoken = 'somesecrettokenhere',
trim_start="2000-1-1", trim_end="2016-12-31", collapse = 'annual')
In [15]:
data.columns = ['GDP','Unemployment', 'Interest_rate', 'CPI_Urban']
In [16]:
data.head()
Out[16]:
We can have a bit of statistical analysis here. Let's check the correlation between those 4 variables by generating a correlation matrix. That can again easily be done using a corr() function from Pandas that is available for DataFrames.
In [17]:
data.corr()
Out[17]:
It can be inferred that the increase in Interest rate is accompained by decrease in everything else (negative correlation). The strongest among them is the correlation with Unemployment rate: increase in interest rates is accompained by decrease un unemployment and hence increase in employment rates.
Note: take attention to the term accompained that was used in the paragraph above. One cannot claim that interest rates drive the increasse or decrease, as the correlation does not imply causation.
The overall strongest correlation is the one between urban CPI and GDP (a bit less than full 100% correlation). Conclusions are left upon the reader.
We can use matplotlib.pyplot to plot the GDP trend line.
In [21]:
import matplotlib.pyplot as plt
%matplotlib inline
data["GDP"].plot()
plt.show()